Search | BVS Regional Portal

1.

A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography.

Bukhman, Yury V; Morin, Phillip A; Meyer, Susanne; Chu, Li-Fang; Jacobsen, Jeff K; Antosiewicz-Bourget, Jessica; Mamott, Daniel; Gonzales, Maylie; Argus, Cara; Bolin, Jennifer; Berres, Mark E; Fedrigo, Olivier; Steill, John; Swanson, Scott A; Jiang, Peng; Rhie, Arang; Formenti, Giulio; Phillippy, Adam M; Harris, Robert S; Wood, Jonathan M D; Howe, Kerstin; Kirilenko, Bogdan M; Munegowda, Chetan; Hiller, Michael; Jain, Aashish; Kihara, Daisuke; Johnston, J Spencer; Ionkov, Alexander; Raja, Kalpana; Toh, Huishi; Lang, Aimee; Wolf, Magnus; Jarvis, Erich D; Thomson, James A; Chaisson, Mark J P; Stewart, Ron.

Mol Biol Evol ; 41(3)2024 Mar 01.

Article in English | MEDLINE | ID: mdl-38376487

ABSTRACT

The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.

Subjects

Balaenoptera , Neoplasms , Animals , Balaenoptera/genetics , Genomic Segmental Duplications , Genome , Demography , Neoplasms/genetics

2.

The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.

Makova, Kateryna D; Pickett, Brandon D; Harris, Robert S; Hartley, Gabrielle A; Cechova, Monika; Pal, Karol; Nurk, Sergey; Yoo, DongAhn; Li, Qiuhui; Hebbar, Prajna; McGrath, Barbara C; Antonacci, Francesca; Aubel, Margaux; Biddanda, Arjun; Borchers, Matthew; Bomberg, Erich; Bouffard, Gerard G; Brooks, Shelise Y; Carbone, Lucia; Carrel, Laura; Carroll, Andrew; Chang, Pi-Chuan; Chin, Chen-Shan; Cook, Daniel E; Craig, Sarah J C; de Gennaro, Luciana; Diekhans, Mark; Dutra, Amalia; Garcia, Gage H; Grady, Patrick G S; Green, Richard E; Haddad, Diana; Hallast, Pille; Harvey, William T; Hickey, Glenn; Hillis, David A; Hoyt, Savannah J; Jeong, Hyeonsoo; Kamali, Kaivan; Kosakovsky Pond, Sergei L; LaPolice, Troy M; Lee, Charles; Lewis, Alexandra P; Loh, Yong-Hwee E; Masterson, Patrick; McCoy, Rajiv C; Medvedev, Paul; Miga, Karen H; Munson, Katherine M; Pak, Evgenia.

bioRxiv ; 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

3.

Accurate sequencing of DNA motifs able to form alternative (non-B) structures.

Weissensteiner, Matthias H; Cremona, Marzia A; Guiblet, Wilfried M; Stoler, Nicholas; Harris, Robert S; Cechova, Monika; Eckert, Kristin A; Chiaromonte, Francesca; Huang, Yi-Fei; Makova, Kateryna D.

Genome Res ; 33(6): 907-922, 2023 06.

Article in English | MEDLINE | ID: mdl-37433640

ABSTRACT

Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.

Subjects

Z-Form DNA , Nanopores , Humans , Nucleotide Motifs , DNA Sequence Analysis , DNA/genetics , Base Composition , High-Throughput Nucleotide Sequencing

4.

Whole-genome sequence and assembly of the Javan gibbon (Hylobates moloch).

Escalona, Merly; VanCampen, Jake; Maurer, Nicholas W; Haukness, Marina; Okhovat, Mariam; Harris, Robert S; Watwood, Allison; Hartley, Gabrielle A; O'Neill, Rachel J; Medvedev, Paul; Makova, Kateryna D; Vollmers, Christopher; Carbone, Lucia; Green, Richard E.

J Hered ; 114(1): 35-43, 2023 03 16.

Article in English | MEDLINE | ID: mdl-36146896

ABSTRACT

The Javan gibbon, Hylobates moloch, is an endangered gibbon species restricted to the forest remnants of western and central Java, Indonesia, and one of the rarest of the Hylobatidae family. Hylobatids consist of 4 genera (Hoolock, Hylobates, Symphalangus, and Nomascus) that are characterized by different numbers of chromosomes, ranging from 38 to 52. The underlying cause of this karyotype plasticity is not entirely understood, due at least in part to the limited availability of genomic data. Here we present the first scaffold-level assembly for H. moloch using a combination of whole-genome Illumina short reads, 10X Chromium linked reads, PacBio, and Oxford Nanopore long reads and proximity-ligation data. This Hylobates genome represents a valuable new resource for comparative genomics studies in primates.

Subjects

Genome , Hylobates , Animals , Hylobates/genetics , Forests , Endangered Species , Indonesia

5.

The minimizer Jaccard estimator is biased and inconsistent.

Belbasi, Mahdi; Blanca, Antonio; Harris, Robert S; Koslicki, David; Medvedev, Paul.

Bioinformatics ; 38(Suppl 1): i169-i176, 2022 06 24.

Article in English | MEDLINE | ID: mdl-35758786

ABSTRACT

MOTIVATION: Sketching is now widely used in bioinformatics to reduce data size and increase data processing speed. Sketching approaches entice with improved scalability but also carry the danger of decreased accuracy and added bias. In this article, we investigate the minimizer sketch and its use to estimate the Jaccard similarity between two sequences. RESULTS: We show that the minimizer Jaccard estimator is biased and inconsistent, which means that the expected difference (i.e. the bias) between the estimator and the true value is not zero, even in the limit as the lengths of the sequences grow. We derive an analytical formula for the bias as a function of how the shared k-mers are laid out along the sequences. We show both theoretically and empirically that there are families of sequences where the bias can be substantial (e.g. the true Jaccard can be more than double the estimate). Finally, we demonstrate that this bias affects the accuracy of the widely used mashmap read mapping tool. AVAILABILITY AND IMPLEMENTATION: Scripts to reproduce our experiments are available at https://github.com/medvedevgroup/minimizer-jaccard-estimator/tree/main/reproduce. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
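To make the comparison in this abstract concrete, here is a minimal Python sketch of the two quantities involved: the Jaccard similarity computed over all k-mers, and the Jaccard similarity computed over minimizer sketches only. The function names, the BLAKE2b-based k-mer ordering, and the window parameter `w` are illustrative assumptions; this is not the authors' code from the linked repository.

```python
import hashlib


def kmers(seq, k):
    """Return the list of all k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hash(kmer):
    """Deterministic pseudo-random ordering of k-mers (an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq, k, w):
    """Set of minimizers: the smallest-hash k-mer in each window of w k-mers."""
    ks = kmers(seq, k)
    sketch = set()
    for i in range(len(ks) - w + 1):
        sketch.add(min(ks[i:i + w], key=kmer_hash))
    return sketch


def jaccard(a, b):
    """Jaccard similarity of two collections; defined as 1.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def true_jaccard(s, t, k):
    """Jaccard similarity over the full k-mer sets of s and t."""
    return jaccard(kmers(s, k), kmers(t, k))


def minimizer_jaccard(s, t, k, w):
    """Jaccard similarity over the minimizer sketches of s and t."""
    return jaccard(minimizer_sketch(s, k, w), minimizer_sketch(t, k, w))
```

With `w = 1` every k-mer is its own minimizer, so the two functions agree; for larger windows the sketch-based value can drift away from the full k-mer value, which is the kind of discrepancy the abstract analyzes.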

Subjects

Software

6.

The Statistics of k-mers from a Sequence Undergoing a Simple Mutation Process Without Spurious Matches.

Blanca, Antonio; Harris, Robert S; Koslicki, David; Medvedev, Paul.

J Comput Biol ; 29(2): 155-168, 2022 02.

Article in English | MEDLINE | ID: mdl-35108101

ABSTRACT

k-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence S (e.g., a genome or a read) undergoes a simple mutation process through which each nucleotide is mutated independently with some probability r, under the assumption that there are no spurious k-mer matches. How does this process affect the k-mers of S? We derive the expectation and variance of the number of mutated k-mers and of the number of islands (a maximal interval of mutated k-mers) and oceans (a maximal interval of nonmutated k-mers). We then derive hypothesis tests and confidence intervals (CIs) for r given an observed number of mutated k-mers, or, alternatively, given the Jaccard similarity (with or without MinHash). We demonstrate the usefulness of our results using a few select applications: obtaining a CI to supplement the Mash distance point estimate, filtering out reads during alignment by Minimap2, and rating long-read alignments to a de Bruijn graph by Jabba.
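The mutation model in this abstract can be sketched in a few lines of Python. Under the stated assumptions (independent per-nucleotide mutation with probability r, no spurious matches), a k-mer is mutated when at least one of its k positions is mutated, which happens with probability 1 - (1 - r)^k. The code below covers only that expectation, a naive point estimate obtained by inverting it, and a small simulation that counts islands. It does not reproduce the paper's variance formulas, hypothesis tests, or confidence intervals, and all function names are hypothetical.

```python
import random


def prob_kmer_mutated(k, r):
    """Probability that a k-mer contains at least one mutated position."""
    return 1.0 - (1.0 - r) ** k


def expected_mutated_kmers(num_kmers, k, r):
    """Expected number of mutated k-mers (linearity of expectation)."""
    return num_kmers * prob_kmer_mutated(k, r)


def estimate_r(num_mutated, num_kmers, k):
    """Naive point estimate of r: invert the expectation formula above."""
    return 1.0 - (1.0 - num_mutated / num_kmers) ** (1.0 / k)


def simulate_mutated_flags(num_kmers, k, r, rng):
    """Simulate one sequence; flag i is True if k-mer i overlaps a mutation."""
    num_positions = num_kmers + k - 1
    mutated = [rng.random() < r for _ in range(num_positions)]
    return [any(mutated[i:i + k]) for i in range(num_kmers)]


def count_islands(flags):
    """Count maximal runs of consecutive mutated k-mers ("islands")."""
    islands = 0
    previous = False
    for flag in flags:
        if flag and not previous:
            islands += 1
        previous = flag
    return islands
```

Calling `simulate_mutated_flags(10000, 21, 0.01, random.Random(1))` and comparing `sum(flags)` with `expected_mutated_kmers(10000, 21, 0.01)` gives a quick sanity check of the expectation on simulated data.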

Subjects

Mutation , DNA Sequence Analysis/statistics & numerical data , Algorithms , Base Sequence , Computational Biology , Confidence Intervals , Genomics/statistics & numerical data , Humans , Genetic Models , Sequence Alignment/statistics & numerical data , Software

7.

Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome.

Guiblet, Wilfried M; Cremona, Marzia A; Harris, Robert S; Chen, Di; Eckert, Kristin A; Chiaromonte, Francesca; Huang, Yi-Fei; Makova, Kateryna D.

Nucleic Acids Res ; 49(3): 1497-1516, 2021 02 22.

Article in English | MEDLINE | ID: mdl-33450015

ABSTRACT

Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.

Subjects

DNA/chemistry , Genetic Variation , Human Genome , Animals , Genetic Loci , Humans , Mutation Rate , Single Nucleotide Polymorphism , Pongo pygmaeus

8.

Progressive Cactus is a multiple-genome aligner for the thousand-genome era.

Armstrong, Joel; Hickey, Glenn; Diekhans, Mark; Fiddes, Ian T; Novak, Adam M; Deran, Alden; Fang, Qi; Xie, Duo; Feng, Shaohong; Stiller, Josefin; Genereux, Diane; Johnson, Jeremy; Marinescu, Voichita Dana; Alföldi, Jessica; Harris, Robert S; Lindblad-Toh, Kerstin; Haussler, David; Karlsson, Elinor; Jarvis, Erich D; Zhang, Guojie; Paten, Benedict.

Nature ; 587(7833): 246-251, 2020 11.

Article in English | MEDLINE | ID: mdl-33177663

ABSTRACT

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.

Subjects

Genome/genetics , Genomics/methods , Sequence Alignment/methods , Software , Vertebrates/genetics , Amnion , Animals , Computer Simulation , Genomics/standards , Haplotypes , Humans , Quality Control , Sequence Alignment/standards , Software/standards

9.

Dynamic evolution of great ape Y chromosomes.

Cechova, Monika; Vegesna, Rahulsimham; Tomaszkiewicz, Marta; Harris, Robert S; Chen, Di; Rangavittal, Samarth; Medvedev, Paul; Makova, Kateryna D.

Proc Natl Acad Sci U S A ; 117(42): 26273-26280, 2020 10 20.

Article in English | MEDLINE | ID: mdl-33020265

ABSTRACT

The mammalian male-specific Y chromosome plays a critical role in sex determination and male fertility. However, because of its repetitive and haploid nature, it is frequently absent from genome assemblies and remains enigmatic. The Y chromosomes of great apes represent a particular puzzle: their gene content is more similar between human and gorilla than between human and chimpanzee, even though human and chimpanzee share a more recent common ancestor. To solve this puzzle, here we constructed a dataset including Ys from all extant great ape genera. We generated assemblies of bonobo and orangutan Ys from short and long sequencing reads and aligned them with the publicly available human, chimpanzee, and gorilla Y assemblies. Analyzing this dataset, we found that the genus Pan, which includes chimpanzee and bonobo, experienced accelerated substitution rates. Pan also exhibited elevated gene death rates. These observations are consistent with high levels of sperm competition in Pan. Furthermore, we inferred that the great ape common ancestor already possessed multicopy sequences homologous to most human and chimpanzee palindromes. Nonetheless, each species also acquired distinct ampliconic sequences. We also detected increased chromatin contacts between and within palindromes (from Hi-C data), likely facilitating gene conversion and structural rearrangements. Our results highlight the dynamic mode of Y chromosome evolution and open avenues for studies of male-specific dispersal in endangered great ape species.

Subjects

Hominidae/genetics , Y Chromosome/genetics , Animals , Biological Evolution , Molecular Evolution , Gene Conversion , Gorilla gorilla/genetics , Humans , Pan paniscus/genetics , Pan troglodytes/genetics , Pongo/genetics , DNA Sequence Analysis

10.

Improved representation of sequence bloom trees.

Harris, Robert S; Medvedev, Paul.

Bioinformatics ; 36(3): 721-727, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31504157

ABSTRACT

MOTIVATION: Algorithmic solutions to index and search biological databases are a fundamental part of bioinformatics, providing underlying components to many end-user tools. Inexpensive next generation sequencing has filled publicly available databases such as the Sequence Read Archive beyond the capacity of traditional indexing methods. Recently, the Sequence Bloom Tree (SBT) and its derivatives were proposed as a way to efficiently index such data for queries about transcript presence. RESULTS: We build on the SBT framework to construct the HowDe-SBT data structure, which uses a novel partitioning of information to reduce the construction and query time as well as the size of the index. Compared to previous SBT methods, on real RNA-seq data, HowDe-SBT can construct the index in less than 36% of the time and with 39% less space and can answer small-batch queries at least five times faster. We also develop a theoretical framework in which we can analyze and bound the space and query performance of HowDe-SBT compared to other SBT methods. AVAILABILITY AND IMPLEMENTATION: HowDe-SBT is available as a free open source program on https://github.com/medvedevgroup/HowDeSBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
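For readers unfamiliar with the underlying data structure, the following is a minimal Python sketch of the basic Sequence Bloom Tree idea that this abstract builds on: each leaf holds a Bloom filter of one experiment's k-mers, each internal node holds the union of its children's filters, and a query descends only into subtrees where at least a fraction `theta` of the query k-mers is present. This is deliberately not HowDe-SBT's representation. The class names, the filter size, the hash choice (salted BLAKE2b), and the naive pairwise tree construction are all illustrative assumptions, and the input is assumed to be non-empty.

```python
import hashlib


class Bloom:
    """Tiny Bloom filter backed by a Python int used as a bit vector."""

    def __init__(self, m=4096, num_hashes=2, bits=0):
        self.m, self.num_hashes, self.bits = m, num_hashes, bits

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        return Bloom(self.m, self.num_hashes, self.bits | other.bits)


class Node:
    def __init__(self, bloom, name=None, children=()):
        self.bloom, self.name, self.children = bloom, name, list(children)


def build_tree(experiments, m=4096, num_hashes=2):
    """Build a tree from {name: iterable of k-mers} by pairing nodes level by level."""
    nodes = []
    for name, kmer_iter in experiments.items():
        bloom = Bloom(m, num_hashes)
        for kmer in kmer_iter:
            bloom.add(kmer)
        nodes.append(Node(bloom, name=name))
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes), 2):
            pair = nodes[i:i + 2]
            if len(pair) == 1:
                merged.append(pair[0])  # odd node out is carried up unchanged
            else:
                merged.append(Node(pair[0].bloom.union(pair[1].bloom), children=pair))
        nodes = merged
    return nodes[0]


def query(node, kmer_list, theta):
    """Return names of leaves containing at least theta of the query k-mers."""
    hits = sum(1 for kmer in kmer_list if kmer in node.bloom)
    if hits < theta * len(kmer_list):
        return []  # prune: no leaf below can reach the threshold
    if not node.children:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(query(child, kmer_list, theta))
    return found
```

Because Bloom filters have no false negatives, an experiment that truly contains the query k-mers is always reported; false positives are possible and become more likely as the filters fill up.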

Subjects

Software , Trees , Algorithms , High-Throughput Nucleotide Sequencing , RNA Sequence Analysis

11.

High Satellite Repeat Turnover in Great Apes Studied with Short- and Long-Read Technologies.

Cechova, Monika; Harris, Robert S; Tomaszkiewicz, Marta; Arbeithuber, Barbara; Chiaromonte, Francesca; Makova, Kateryna D.

Mol Biol Evol ; 36(11): 2415-2431, 2019 Nov 01.

Article in English | MEDLINE | ID: mdl-31273383

ABSTRACT

Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.

12.

Noise-cancelling repeat finder: uncovering tandem repeats in error-prone long-read sequencing data.

Harris, Robert S; Cechova, Monika; Makova, Kateryna D.

Bioinformatics ; 35(22): 4809-4811, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31290946

ABSTRACT

SUMMARY: Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response. AVAILABILITY AND IMPLEMENTATION: NCRF is implemented in C, supported by several python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

High-Throughput Nucleotide Sequencing , Human Genome , Humans , Nanopores , DNA Sequence Analysis , Software , Tandem Repeat Sequences

13.

Retrovirus insertion site analysis of LGL leukemia patient genomes.

Li, Weiling; Yang, Lei; Harris, Robert S; Lin, Lin; Olson, Thomas L; Hamele, Cait E; Feith, David J; Loughran, Thomas P; Poss, Mary.

BMC Med Genomics ; 12(1): 88, 2019 06 17.

Article in English | MEDLINE | ID: mdl-31208405

ABSTRACT

BACKGROUND: Large granular lymphocyte (LGL) leukemia is an uncommon cancer characterized by sustained clonal proliferation of LGL cells. Antibodies reactive to retroviruses have been documented in the serum of patients with LGL leukemia. Culture or molecular approaches have to date not been successful in identifying a retrovirus. METHODS: Because a retrovirus must integrate into the genome of an infected cell, we focused our efforts on detecting a novel retrovirus integration site in the clonally expanded LGL cells. We present a new computational tool that uses long-insert mate pair sequence data to search the genome of LGL leukemia cells for retrovirus integration sites. We also utilize recently published methods to interrogate the status of polymorphic human endogenous retrovirus type K (HERV-K) provirus in patient genomes. RESULTS: Our data show that there are no new retrovirus insertions in LGL genomes of LGL leukemia patients. However, our insertion call tool did detect four HERV-K provirus integration sites that are polymorphic in the human population but absent from the human reference genome, hg19. To determine if the prevalence of these or other polymorphic proviral HERV-Ks differed between LGL leukemia patients and the general population, we used a recently developed tool that reports sites in the human genome occupied by a known proviral HERV-K. We report that there are significant differences in the number of polymorphic HERV-Ks in the genomes of LGL leukemia patients of European origin compared to individuals with European ancestry in the 1000 Genomes Project (KGP) data. CONCLUSIONS: Our study confirms that the clonal expansion of LGL cells in LGL leukemia is not driven by the integration of a new infectious or endogenous retrovirus, although we do not rule out that these cells are responding to retroviral antigens produced in other cell types. However, our computational analyses revealed that the genomes of LGL leukemia patients carry a higher burden of polymorphic HERV-K proviruses compared to individuals from KGP of European ancestry. Our research emphasizes the merits of comprehensive genomic assessment of HERV-K in cancer samples and suggests that further analyses to determine contributions of HERV-K to LGL leukemia are warranted.

Subjects

Human Genome/genetics , Large Granular Lymphocytic Leukemia/genetics , Large Granular Lymphocytic Leukemia/virology , Proviruses/physiology , Retroviridae/physiology , Virus Integration/genetics , Humans

14.

Elevation in lung volume and preventing catastrophic airway closure in asthmatics during bronchoconstriction.

Osorio-Valencia, Juan S; Wongviriyawong, Chanikarn; Winkler, Tilo; Kelly, Vanessa J; Harris, Robert S; Venegas, Jose G.

PLoS One ; 13(12): e0208337, 2018.

Article in English | MEDLINE | ID: mdl-30566496

ABSTRACT

BACKGROUND: Asthma exacerbations cause lung hyperinflation, elevation in load to inspiratory muscles, and decreased breathing capacity that, in severe cases, may lead to inspiratory muscle fatigue and respiratory failure. Hyperinflation has been attributed to a passive mechanical origin: a respiratory system time-constant too long for full exhalation. However, because the increase in volume is also concurrent with activation of inspiratory muscles during exhalation, it is unclear whether hyperinflation in bronchoconstriction is a passive phenomenon or is actively controlled to avoid airway closure. METHODS: Using CT scanning, we measured the distensibility of individual segmental airways relative to that of their surrounding parenchyma in seven subjects with asthma and nine healthy controls. With these data we tested whether the elevation of lung volume measured after methacholine (MCh) provocation was associated with airway narrowing or with the volume required to prevent airway closure. We also tested whether the reduction in FVC post-MCh could be attributed to gas trapped behind closed segmental airways. FINDINGS: The changes in lung volume by MCh in subjects with and without asthma were inversely associated with their reduction in average airway lumen. This finding would be inconsistent with hyperinflation by passive elevation of airway resistance. In contrast, the change in volume of each subject was associated with the lung volume estimated to cause the closure of the least stable segmental airway of his/her lungs. In addition, the measured drop in FVC post-MCh was associated with the estimated volume of gas trapped behind closed segmental airways at RV. CONCLUSIONS: Our data support the concept that hyperinflation caused by MCh-induced bronchoconstriction is the result of an actively controlled process where parenchymal distending forces on airways are increased to counteract their closure. To our knowledge, this is the first imaging-based study that associates inter-subject differences in whole lung behavior with the interdependence between individual airways and their surrounding parenchyma.

Subjects

Asthma/drug therapy , Asthma/physiopathology , Bronchoconstriction/drug effects , Adult , Airway Resistance/drug effects , Bronchoconstrictor Agents/therapeutic use , Female , Humans , Lung/drug effects , Lung/physiology , Lung Volume Measurements , Male , Theoretical Models , Tidal Volume/drug effects , Young Adult

15.

Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.

Guiblet, Wilfried M; Cremona, Marzia A; Cechova, Monika; Harris, Robert S; Kejnovská, Iva; Kejnovsky, Eduard; Eckert, Kristin; Chiaromonte, Francesca; Makova, Kateryna D.

Genome Res ; 28(12): 1767-1778, 2018 12.

Article in English | MEDLINE | ID: mdl-30401733

ABSTRACT

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.

Subjects

DNA/chemistry , Genomics , High-Throughput Nucleotide Sequencing , Nucleic Acid Conformation , DNA Sequence Analysis , DNA Replication , G-Quadruplexes , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Kinetics , Mutation , Nucleotide Motifs , Reproducibility of Results , DNA Sequence Analysis/methods

16.

The effect of emphysema on readmission and survival among smokers with heart failure.

Kohli, Puja; Staziaki, Pedro V; Janjua, Sumbal A; Addison, Daniel A; Hallett, Travis R; Hennessy, Orla; Takx, Richard A P; Lu, Michael T; Fintelmann, Florian J; Semigran, Marc; Harris, Robert S; Celli, Bartolome R; Hoffmann, Udo; Neilan, Tomas G.

PLoS One ; 13(7): e0201376, 2018.

Article in English | MEDLINE | ID: mdl-30059544

ABSTRACT

Heart failure (HF) and chronic obstructive pulmonary disease (COPD) are morbid diseases that often coexist. In patients with coexisting disease, COPD is an independent risk factor for readmission and mortality. However, spirometry is often inaccurate in those with active HF. Therefore, we investigated the association between the presence of emphysema on computed tomography (CT) and readmission rates in smokers admitted with HF. The cohort included a consecutive group of smokers discharged with HF from a tertiary center between January 1, 2014 and April 1, 2014 who also had a CT of the chest for dyspnea. The primary endpoint was any readmission for HF before April 1, 2016; secondary endpoints were 30-day readmission for HF, length of stay, and all-cause mortality. Over the study period, there were 225 inpatient smokers with HF who had a concurrent chest CT (155 [69%] males, age 69±11 years, ejection fraction [EF] 46±18%, 107 [48%] LVEF of < 50%). Emphysema on CT was present in 103 (46%), and these patients were older and had a lower BMI, more pack-years, less diabetes, and an increased afterload. During a follow-up of 2.1 years, there were 110 (49%) HF readmissions and 55 (24%) deaths. When separated by emphysema on CT, any readmission, 30-day readmission, length of stay, and mortality were higher among HF patients with emphysema. In multivariable regression, emphysema by CT was associated with a two-fold higher (adjusted HR 2.11, 95% CI 1.41-3.15, p < 0.001) risk of readmission and a trend toward increased mortality (adjusted HR 1.70, 95% CI 0.86-3.34, p = 0.12). In conclusion, emphysema by CT is a frequent finding in smokers hospitalized with HF and is associated with adverse outcomes in HF. This underrecognized group of patients with both emphysema and HF may benefit from improved recognition and characterization of their comorbid disease processes and optimization of therapies for their lung disease.

Subjects

Heart Failure/mortality; Patient Readmission; Pulmonary Disease, Chronic Obstructive/mortality; Registries; Smoking/mortality; Aged; Aged, 80 and over; Disease-Free Survival; Female; Heart Failure/complications; Heart Failure/diagnostic imaging; Heart Failure/therapy; Humans; Male; Middle Aged; Pulmonary Disease, Chronic Obstructive/complications; Pulmonary Disease, Chronic Obstructive/diagnostic imaging; Pulmonary Disease, Chronic Obstructive/therapy; Retrospective Studies; Risk Factors; Smoking/adverse effects; Smoking/therapy; Survival Rate; Tomography, X-Ray Computed

17.

Deterioration of Regional Lung Strain and Inflammation during Early Lung Injury.

Motta-Ribeiro, Gabriel C; Hashimoto, Soshi; Winkler, Tilo; Baron, Rebecca M; Grogg, Kira; Paula, Luís F S C; Santos, Arnoldo; Zeng, Congli; Hibbert, Kathryn; Harris, Robert S; Bajwa, Ednan; Vidal Melo, Marcos F.

Am J Respir Crit Care Med ; 198(7): 891-902, 2018 10 01.

Article in English | MEDLINE | ID: mdl-29787304

ABSTRACT

RATIONALE: The contribution of aeration heterogeneity to lung injury during early mechanical ventilation of uninjured lungs is unknown. OBJECTIVES: To test the hypotheses that a strategy consistent with clinical practice does not protect from worsening in lung strains during the first 24 hours of ventilation of initially normal lungs exposed to mild systemic endotoxemia in supine versus prone position, and that local neutrophilic inflammation is associated with local strain and blood volume at global strains below a proposed injurious threshold. METHODS: Voxel-level aeration and tidal strain were assessed by computed tomography in sheep ventilated with low Vt and positive end-expiratory pressure while receiving intravenous endotoxin. Regional inflammation and blood volume were estimated from 2-deoxy-2-[(18)F]fluoro-d-glucose (18F-FDG) positron emission tomography. MEASUREMENTS AND MAIN RESULTS: Spatial heterogeneity of aeration and strain increased only in supine lungs (P < 0.001), with higher strains and atelectasis than prone at 24 hours. Absolute strains were lower than those considered globally injurious. Strains redistributed to higher aeration areas as lung injury progressed in supine lungs. At 24 hours, tissue-normalized 18F-FDG uptake increased more in atelectatic and moderately high-aeration regions (>70%) than in normally aerated regions (P < 0.01), with differential mechanistically relevant regional gene expression. 18F-FDG phosphorylation rate was associated with strain and blood volume. Imaging findings were confirmed in ventilated patients with sepsis. CONCLUSIONS: Mechanical ventilation consistent with clinical practice did not generate excessive regional strain in heterogeneously aerated supine lungs. However, it allowed worsening of spatial strain distribution in these lungs, associated with increased inflammation. Our results support the implementation of early aeration homogenization in normal lungs.

Subjects

Acute Lung Injury/pathology; Pulmonary Atelectasis/etiology; Respiration, Artificial/adverse effects; Respiratory Distress Syndrome/etiology; Acute Lung Injury/diagnostic imaging; Acute Lung Injury/etiology; Analysis of Variance; Animals; Biopsy, Needle; Blood Gas Analysis; Disease Models, Animal; Endotoxemia/etiology; Endotoxemia/physiopathology; Endotoxins/pharmacology; Female; Fluorodeoxyglucose F18; Humans; Immunohistochemistry; Infusions, Intravenous; Linear Models; Multivariate Analysis; Positron-Emission Tomography/methods; Pulmonary Atelectasis/diagnostic imaging; Random Allocation; Respiration, Artificial/methods; Respiratory Distress Syndrome/diagnostic imaging; Respiratory Distress Syndrome/pathology; Respiratory Function Tests; Risk Factors; Sheep; Tidal Volume/physiology; Time Factors; Tomography, X-Ray Computed/methods

18.

AllSome Sequence Bloom Trees.

Sun, Chen; Harris, Robert S; Chikhi, Rayan; Medvedev, Paul.

J Comput Biol ; 25(5): 467-479, 2018 05.

Article in English | MEDLINE | ID: mdl-29620920

ABSTRACT

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, at the cost of up to 3× memory consumption during queries. Notably, it can query a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.

Subjects

Algorithms; Computational Biology/methods; Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Blood/metabolism; Brain/metabolism; Breast/metabolism; Female; Humans; Transcriptome

19.

RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly.

Rangavittal, Samarth; Harris, Robert S; Cechova, Monika; Tomaszkiewicz, Marta; Chikhi, Rayan; Makova, Kateryna D; Medvedev, Paul.

Bioinformatics ; 34(7): 1125-1131, 2018 04 01.

Article in English | MEDLINE | ID: mdl-29194476

ABSTRACT

Motivation: The haploid mammalian Y chromosome is usually under-represented in genome assemblies due to high repeat content and low depth due to its haploid nature. One strategy to ameliorate the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. As the enrichment process is imperfect, algorithms are needed to identify putative Y-specific reads prior to downstream assembly. A strategy that uses k-mer abundances to identify such reads was used to assemble the gorilla Y. However, the strategy required the manual setting of key parameters, a time-consuming process leading to sub-optimal assemblies. Results: We develop a method, RecoverY, that selects Y-specific reads by automatically choosing the abundance level at which a k-mer is deemed to originate from the Y. This algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternate strategies of filtering the reads or contigs. Compared to the preliminary strategy used by Tomaszkiewicz et al., we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection. Availability and implementation: Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY. Contact: kmakova@bx.psu.edu or pashadag@cse.psu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Y Chromosome; Algorithms; Animals; Chromosomes, Mammalian; Genomics/methods; Gorilla gorilla/genetics; Humans; Male; Mammals

20.

Allergic asthma is distinguished by sensitivity of allergen-specific CD4+ T cells and airway structural cells to type 2 inflammation.

Cho, Josalyn L; Ling, Morris F; Adams, David C; Faustino, Lucas; Islam, Sabina A; Afshar, Roshi; Griffith, Jason W; Harris, Robert S; Ng, Aylwin; Radicioni, Giorgia; Ford, Amina A; Han, Andre K; Xavier, Ramnik; Kwok, William W; Boucher, Richard; Moon, James J; Hamilos, Daniel L; Kesimer, Mehmet; Suter, Melissa J; Medoff, Benjamin D; Luster, Andrew D.

Sci Transl Med ; 8(359): 359ra132, 2016 10 05.

Article in English | MEDLINE | ID: mdl-27708065

ABSTRACT

Despite systemic sensitization, not all allergic individuals develop asthma symptoms upon airborne allergen exposure. Determination of the factors that lead to the asthma phenotype in allergic individuals could guide treatment and identify novel therapeutic targets. We used segmental allergen challenge of allergic asthmatics (AA) and allergic nonasthmatic controls (AC) to determine whether there are differences in the airway immune response or airway structural cells that could drive the development of asthma. Both groups developed prominent allergic airway inflammation in response to allergen. However, asthmatic subjects had markedly higher levels of innate type 2 receptors on allergen-specific CD4+ T cells recruited into the airway. There were also increased levels of type 2 cytokines, increased total mucin, and increased mucin MUC5AC in response to allergen in the airways of AA subjects. Furthermore, type 2 cytokine levels correlated with the mucin response in AA but not AC subjects, suggesting differences in the airway epithelial response to inflammation. Finally, AA subjects had increased airway smooth muscle mass at baseline measured in vivo using novel orientation-resolved optical coherence tomography. Our data demonstrate that the development of allergic asthma is dependent on the responsiveness of allergen-specific CD4+ T cells to innate type 2 mediators as well as increased sensitivity of airway epithelial cells and smooth muscle to type 2 inflammation.

Subjects

Allergens/immunology; Asthma/immunology; Hypersensitivity/immunology; Inflammation/immunology; Inflammation/pathology; Th2 Cells/immunology; Adult; Asthma/complications; Asthma/pathology; Cytokines; Humans; Hypersensitivity/complications; Hypersensitivity/pathology; Inflammation/complications; Lung/pathology; Mucus/metabolism; Muscle, Smooth/immunology; Muscle, Smooth/pathology; Phenotype
